Phytochemical profiling of soybean genotypes using GC-MS and UHPLC-DAD/MS

Soybean is one of the most economically important crops worldwide. However, soybean yield can be substantially reduced by many diseases, and soybean genotypes can differ in their reactions to pathogen infection. As a first step toward investigating the biochemical basis of soybean resistance and susceptibility to disease, phytochemicals in the seeds of 52 soybean genotypes previously reported to have different reactions to soybean rust (SBR), Phomopsis seed decay (PSD), and purple seed stain (PSS) were analyzed. Using GC-MS, a total of 46 compounds belonging to 11 chemical groups were tentatively identified. Among those, the major group was esters, followed by carboxylic acids, ketones, and sugar moieties. Compounds with reported antioxidant, antimicrobial, and anti-inflammatory activities were also identified. UHPLC-DAD/MS analysis indicated that five major isoflavone components were present in the samples: daidzin, glycitin, genistin, malonyldaidzin, and malonylglycitin. Isoflavones have been reported to play an important role in defense against plant pathogens. Although isoflavone content varied among soybean genotypes, those with the SBR resistance gene Rpp6 (PI 567102B, PI 567104B, PI 567129) consistently exhibited the highest concentrations of daidzin, glycitin, genistin, and malonyldaidzin. The SBR-resistant genotype PI 230970 (Rpp2) had the greatest amount of genistin. The SBR-resistant genotype PI 200456 (Rpp5) uniquely contained glycitein, a compound that was absent in the other 51 genotypes examined. The PSD-resistant genotype PI 424324B had nearly four times as much stigmasterol as PI 556625, which was susceptible to SBR, PSD, and PSS in our previous tests. Results of this study provide useful information for further investigation of the biochemical basis of soybean resistance to diseases.
The results may also aid in selection of soybean lines for breeding for resistance to soybean rust and other diseases.


Introduction
Soybean (Glycine max (L.) Merr.) is one of the most economically important crops in the world. Although soybean is native to Asia, it was introduced into North America, Europe, and later into South and Central America, and eventually to other parts of the world [1]. World soybean production was 170 million metric tons (MMT) in 1960 and increased to 398.3 MMT in 2023 (https://ipad.fas.usda.gov/cropexplorer/cropview/commodityView.aspx?cropid=2222000). Soybean accounted for 70.86% of the global supply of plant-based protein meal and 28.88% of the plant-based oil in the 2020/2021 market year, based on the Market View Data Base of the United Soybean Board [2]. Soybean is considered essential for global food security [3]. Soybean seeds contain 8.3 to 27.9% oil and 34.1 to 56.8% protein, depending on the variety and cultivation conditions [4]. Studies indicate that soybeans and soy-based foods offer numerous health benefits, in addition to being among the most abundant and cost-effective protein sources available [5]. Soybean is well suited for human and animal nutrition [6], as well as for biodiesel production [7].
Global demand for soybean has been increasing significantly. However, soybean production can be severely impacted by many diseases. Of more than 200 reported soybean pathogens, about 35 cause major economic losses [8]. Soybean genotypes can differ in their reactions to pathogen infection [9,10]. Soybean has been reported to contain numerous bioactive phytochemicals, including, but not limited to, phenolic acids, flavonoids, isoflavones, saponins, phytosterols, and sphingolipids [11][12][13]. Although soybean phytochemicals have a positive effect on the human immune system [14], their role in the responses of soybean genotypes to pathogens has not been well studied.
As a first step toward investigating the biochemical basis of soybean resistance and susceptibility to disease, phytochemicals were analyzed in the seeds of 52 soybean genotypes previously reported to have different reactions to soybean rust (SBR) caused by Phakopsora pachyrhizi, Phomopsis seed decay (PSD) caused by Diaporthe longicolla, and purple seed stain (PSS) caused by Cercospora spp., and their potential value against the causal pathogens was explored. SBR is one of the most devastating foliar diseases, causing yield losses of up to 90% [8]. Both PSD and PSS are seed diseases that reduce seed quality and seed lot grade for marketing purposes [8].
The specific objectives of this study were (i) to identify and classify compounds in soybean seeds using gas chromatography/mass spectrometry (GC-MS); (ii) to analyze isoflavones in soybean seeds using ultra-high performance liquid chromatography with diode-array detection and mass spectrometry (UHPLC-DAD/MS); and (iii) to compare phytochemical profiles among soybean genotypes. Overall, the results of this study provide useful information for further investigation of the biochemical basis of soybean resistance and susceptibility to disease. The results may also aid in the selection of soybean genotypes for breeding for resistance to soybean diseases.

Plant materials
Seeds of fifty-two soybean genotypes originating from eight countries (Brazil, China, India, Indonesia, South Korea, Japan, USA, and Vietnam) were used in this study. The name and geographical origin of each genotype, together with references describing its reactions to the three soybean pathogens (Phakopsora pachyrhizi, Diaporthe longicolla, and Cercospora spp.) that cause soybean rust (SBR), Phomopsis seed decay (PSD), and purple seed stain (PSS), respectively, are shown in Table 1. Seeds of these genotypes were obtained from the curators of the USDA-ARS GRIN (ars-grin.gov).

Sample extraction method
Eight isoflavones were quantified in a total of 52 soybean seed samples. Soybean seeds were ground and homogenized to obtain a uniform matrix. For GC-MS analysis, 5 mL of methanol was added to 100 mg of soybean seed powder, and the mixture was sonicated at room temperature for 60 min. For UHPLC-DAD/MS analysis, 1.5 mL of 70% (v/v) aqueous ethanol (EtOH) containing 0.1% formic acid was added to 100 mg of soybean seed powder; the mixture was sonicated at room temperature for 30 min and then centrifuged at 14,000 rpm for 10 min, and the supernatant was filtered. This extraction was repeated two more times, and the combined filtrates were brought to a final volume of 5 mL in a volumetric flask. Both the GC-MS and UHPLC-DAD/MS extracts were stored at 4 °C.
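Because each extract corresponds to 100 mg of seed powder in a final volume of 5 mL, a concentration measured in the extract can be converted to a content per gram of seed, the unit in which compound amounts (mg/g) are reported in this study. The sketch below is a hypothetical helper, not the authors' workflow; it assumes that calibration yields a concentration in µg/mL of the 5 mL extract and that no further dilution was made before injection.

```python
def content_mg_per_g(conc_ug_per_ml: float,
                     extract_volume_ml: float = 5.0,
                     sample_mass_mg: float = 100.0) -> float:
    """Convert an extract concentration (ug/mL) to mg of analyte per g of seed powder.

    Hypothetical helper: assumes the whole 100 mg sample ends up in
    `extract_volume_ml` of extract and that the injected solution was not diluted.
    """
    total_ug = conc_ug_per_ml * extract_volume_ml  # ug of analyte in the whole extract
    sample_g = sample_mass_mg / 1000.0             # mg of seed powder -> g
    return total_ug / sample_g / 1000.0            # ug/g -> mg/g


# With 5 mL of extract from 100 mg of seed, 1 ug/mL in the extract equals 0.05 mg/g of seed.
print(content_mg_per_g(1.0))
```

The fixed 5 mL / 100 mg ratio means the conversion reduces to a constant factor of 0.05 (mg/g per µg/mL) as long as these assumptions hold.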

Gas Chromatography-Mass Spectrometry (GC-MS) analysis
The methanolic extracts of the 52 soybean genotypes were analyzed using an Agilent 7890 GC (Agilent Technologies, Santa Clara, CA, USA) equipped with a 7693 autosampler. Separation was achieved on an Agilent DB-5MS Ultra Inert column (60 m × 0.25 mm × 0.25 μm). Helium carrier gas was used in constant flow mode at 1 mL/min. The inlet was held at 260 °C and operated in split mode with a split ratio of 50:1. The GC oven temperature was ramped from 80 °C to 125 °C at 3 °C/min, then to 140 °C at 1 °C/min, held at 140 °C for 10 min, ramped to 170 °C at 3 °C/min, and finally ramped to 280 °C at 8 °C/min, where it was held for 10 min. Each sample was injected in triplicate with an injection volume of 1 μL, and the peak area percent was calculated as the average of the three injections. The presence or absence of each compound was determined from its detection in the triplicate injections, using a signal-to-noise (S/N) threshold of 3:1. The mass spectral detector was an Agilent 5977A quadrupole mass spectrometer operated in full-scan acquisition mode with an electron ionization source at 70 eV. The ion source, quadrupole, and transfer line temperatures were set to 230, 150, and 280 °C, respectively. Data were acquired using MassHunter Acquisition software (B.07006.2704).
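The oven program above can be written as a list of (rate, target, hold) steps, which makes it easy to check the set-point temperature at any retention time and the total programmed run time. This is an illustrative sketch only: it models set-point temperatures, not the actual oven temperature, and it ignores cool-down and equilibration between injections, which the section does not describe.

```python
# Oven program from the text: start at 80 degC, then (rate degC/min, target degC, hold min).
START_C = 80.0
RAMPS = [
    (3.0, 125.0, 0.0),
    (1.0, 140.0, 10.0),
    (3.0, 170.0, 0.0),
    (8.0, 280.0, 10.0),
]


def program_segments(start=START_C, ramps=RAMPS):
    """Yield (t_start, t_end, T_start, T_end) for every ramp and every isothermal hold."""
    t, temp = 0.0, start
    for rate, target, hold in ramps:
        dt = (target - temp) / rate          # minutes needed to reach the target
        yield (t, t + dt, temp, target)
        t, temp = t + dt, target
        if hold > 0:
            yield (t, t + hold, temp, temp)  # isothermal hold
            t += hold


def oven_temperature(t_min, start=START_C, ramps=RAMPS):
    """Set-point oven temperature (degC) at time t_min, by linear interpolation."""
    for t0, t1, T0, T1 in program_segments(start, ramps):
        if t0 <= t_min <= t1:
            return T0 + (T1 - T0) * (t_min - t0) / (t1 - t0)
    raise ValueError("time is outside the oven program")


def run_time(start=START_C, ramps=RAMPS):
    """Total programmed run time in minutes."""
    return max(t1 for _, t1, _, _ in program_segments(start, ramps))


# 15 + 15 + 10 (hold) + 10 + 13.75 + 10 (hold) = 73.75 min
print(run_time())
```

The slow 1 °C/min segment between 125 and 140 °C alone accounts for 15 min of the run, which is where the program spends extra time resolving closely eluting compounds.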
Compound identification involved a comparison of the spectra with the NIST database (Version 2.2) using a probability-based matching algorithm.

Ultra-high performance liquid chromatography-diode array detector/mass spectrometry (UHPLC-DAD/MS) analysis
Isoflavone components in soybean seeds were analyzed on a 1290 Infinity series UHPLC system equipped with a diode array detector, binary pump, autosampler, and thermostatted column compartment. Separation was achieved on an Agilent ZORBAX Eclipse Plus C18 column (2.1 × 100 mm, 1.8 μm) maintained at 30 °C throughout the analysis. The mobile phase consisted of water (A) and acetonitrile (B), both containing 0.1% formic acid. The gradient started at 10% B at 0 min and increased to 40% B at 20 min and to 100% B at 25 min. Each run was followed by a 5 min wash with 100% B and then a 6 min equilibration at 10% B. The flow rate was 0.25 mL/min, and the injection volume was 2 μL. DAD detection was set at 250 and 260 nm, and the optimal wavelength for quantifying each compound was also evaluated [15]. Mass spectrometric analysis for compound identification and confirmation was performed on an Agilent 6120 quadrupole mass spectrometer equipped with an ESI source, using the following parameters: drying gas (N2) flow rate 10 L/min at 300 °C, nebulizer pressure 30 psi, sheath gas temperature 325 °C with a flow rate of 10 L/min, capillary voltage 3000 V, and fragmentor voltage 120 V. Data acquisition was controlled by Agilent MassHunter Acquisition Software (Ver. A.05.01), and data were processed with MassHunter Qualitative Analysis and MassHunter Quantitative Analysis Software (Ver. B.10.0).
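The binary gradient can be expressed as a time table of (minute, %B) points. The sketch below assumes linear changes between the listed points and an immediate step back to 10% B before the 6 min equilibration; both are assumptions, since the section only lists the gradient points. It reports the programmed %B at any time, the time between injections, and the mobile phase volume pumped per cycle.

```python
# Gradient time table as (time in min, %B); straight-line changes between points are assumed.
GRADIENT = [(0.0, 10.0), (20.0, 40.0), (25.0, 100.0), (30.0, 100.0)]  # last segment = 5 min wash
EQUILIBRATION_MIN = 6.0   # re-equilibration at 10% B (step change back to 10% B assumed)
FLOW_ML_PER_MIN = 0.25


def percent_b(t_min: float) -> float:
    """Programmed %B (acetonitrile with 0.1% formic acid) at time t_min."""
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise ValueError("time is outside the gradient program")


def cycle_time_min() -> float:
    """Gradient plus wash plus re-equilibration, i.e. the time between injections."""
    return GRADIENT[-1][0] + EQUILIBRATION_MIN


def mobile_phase_ml_per_run() -> float:
    """Total mobile phase pumped per injection cycle at constant flow."""
    return cycle_time_min() * FLOW_ML_PER_MIN


for t in (0.0, 10.0, 20.0, 22.5, 25.0, 30.0):
    print(f"{t:>5} min: {percent_b(t):5.1f}% B")
print(cycle_time_min(), "min per cycle;", mobile_phase_ml_per_run(), "mL per cycle")
```

Under these assumptions each injection occupies a 36 min cycle and consumes about 9 mL of mobile phase.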

GC-MS data analysis
The percent peak area data obtained from GC-MS analysis were summed by compound class and then exported to SIMCA-P+ 13.0 software (Umetrics AB, Umeå, Sweden). Principal component analysis (PCA) was performed with all variables Pareto scaled. For each compound, the identification with the highest probability score was used.
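SIMCA-P+ is proprietary, so the following is only an open sketch of the same two steps, Pareto scaling followed by PCA, using NumPy's singular value decomposition rather than SIMCA's own routine. Pareto scaling mean-centers each variable and divides it by the square root of its standard deviation, which damps large classes less aggressively than unit-variance scaling. The data matrix here is random placeholder data (52 genotypes × 11 compound classes), not the study's S2 Table, and component signs are arbitrary in any PCA implementation, so loadings may appear flipped relative to SIMCA output.

```python
import numpy as np


def pareto_scale(X):
    """Mean-center each column and divide it by the square root of its standard deviation."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # constant columns stay centered (all zeros) instead of dividing by 0
    return centered / np.sqrt(sd)


def pca(X, n_components=3):
    """PCA via SVD of an already centered matrix.

    Returns scores (samples x PCs), loadings (variables x PCs) and the
    fraction of total variance explained by each retained component.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    return scores, loadings, explained[:n_components]


# Placeholder data: rows are genotypes, columns are summed peak-area % per compound class.
rng = np.random.default_rng(0)
X = rng.uniform(0, 40, size=(52, 11))
scores, loadings, explained = pca(pareto_scale(X))
print("explained variance per PC:", np.round(explained, 4))
```

With real class-level data in place of `X`, the `explained` fractions play the role of the per-component variance percentages reported in the PCA results.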

GC-MS analysis
Using GC-MS analysis, a total of 46 compounds were tentatively identified based on NIST database searches and on retention times and retention indices compared with literature data. A typical total ion chromatogram of one soybean genotype is shown in Fig 2. The tentative identification and classification results given in Table 2 and S1 Table indicate that the methanolic extracts mainly comprised compounds from 11 chemical groups, including esters, carboxylic acids, ketones, sugar moieties, heterocyclic compounds, and phenolic compounds. Compounds with reported antioxidant, antimicrobial, and anti-inflammatory activities are indicated in Table 2.
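Retention indices for temperature-programmed GC are commonly computed with the linear (van den Dool and Kratz) formula, $I = 100\,[n + (t_x - t_n)/(t_{n+1} - t_n)]$, where $t_n$ and $t_{n+1}$ are the retention times of the n-alkanes eluting just before and after the analyte. The section does not state which alkane standards or which index formula were used, so this sketch is an assumption, and the alkane retention times in the example are hypothetical.

```python
import bisect


def linear_retention_index(t_x, alkane_rts):
    """Linear (temperature-programmed) retention index of an analyte.

    alkane_rts maps n-alkane carbon number -> retention time (min).
    If the ladder skips carbon numbers, the (N - n) factor generalizes the
    formula to the bracketing alkanes n and N.
    """
    carbons = sorted(alkane_rts)
    times = [alkane_rts[c] for c in carbons]
    i = bisect.bisect_right(times, t_x)
    if i == 0 or i == len(times):
        raise ValueError("analyte elutes outside the alkane ladder")
    n, n_next = carbons[i - 1], carbons[i]
    t_n, t_next = times[i - 1], times[i]
    return 100 * (n + (n_next - n) * (t_x - t_n) / (t_next - t_n))


# Hypothetical ladder: C12 at 20.0 min, C13 at 24.0 min, C14 at 28.0 min.
print(linear_retention_index(22.0, {12: 20.0, 13: 24.0, 14: 28.0}))  # 1250.0
```

A computed index can then be compared with literature values for the same stationary phase (here a 5% phenyl DB-5MS type column) as a check on the NIST spectral match.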
Phenolic compounds have been reported to be significantly associated with antimicrobial and antioxidant activities [16,17]. Two phenolic compounds, 4-vinylguaiacol and 2,3-dimethoxyphenol, were identified. 4-Vinylguaiacol was found in three genotypes possessing soybean rust resistance genes: PI 417132 (Rpp3), PI 200487 and PI 200526 (Rpp5). The other phenolic compound, 2,3-dimethoxyphenol, was detected in 19 soybean genotypes, including 11 lines carrying a soybean rust resistance gene (Rpp1, Rpp3, Rpp5, or Rpp6), three lines susceptible to soybean rust, two lines resistant to Phomopsis seed decay, and one line susceptible to Phomopsis seed decay (S1 Table). How the presence of these compounds relates to the different levels of resistance or susceptibility of the soybean genotypes is uncertain. Notably, no phenolic compounds were identified in PI 518671 (Williams 82), a genotype well known to be susceptible to soybean rust [18], Phomopsis seed decay [19,20], and purple seed stain [21].
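Presence/absence calls such as the genotype counts above follow from the S/N > 3:1 rule applied to the triplicate injections. The Methods do not say whether all three injections had to exceed the threshold, so this sketch assumes they did (`min_hits=3`); the genotype labels G1 to G3 and their S/N values are made up for illustration.

```python
SN_THRESHOLD = 3.0


def is_present(sn_values, threshold=SN_THRESHOLD, min_hits=3):
    """Call a compound present if at least `min_hits` injections exceed the S/N threshold.

    min_hits=3 is an assumption: detection required in all three replicate injections.
    """
    return sum(sn > threshold for sn in sn_values) >= min_hits


def genotypes_with(compound, sn_table, **kwargs):
    """Sorted list of genotypes in which `compound` is called present."""
    return sorted(
        genotype
        for genotype, per_compound in sn_table.items()
        if is_present(per_compound.get(compound, []), **kwargs)
    )


# Hypothetical S/N values for three replicate injections per genotype.
EXAMPLE = {
    "G1": {"2,3-dimethoxyphenol": [5.1, 4.0, 3.5]},
    "G2": {"2,3-dimethoxyphenol": [5.0, 2.0, 6.0]},  # one injection below threshold
    "G3": {},                                        # compound not detected at all
}
print(genotypes_with("2,3-dimethoxyphenol", EXAMPLE))
```

Relaxing the rule to two of three injections (`min_hits=2`) would also admit G2, which shows how sensitive presence/absence counts can be to this unstated detail.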
Five carboxylic acids were identified in total. Notably, palmitic acid and oleic acid, both recognized for their anti-inflammatory properties [22,23], were detected in all genotypes. Linoleic acid, which has been reported to have anti-hyperlipidemic effects [24], was also detected in all samples. Another compound, linolenic acid, which has been reported to have anti-inflammatory activity, was absent in samples SBR3-SBR6 (PI 594538A, PI 594538B, PI 230970, PI 567025A) and SBR9-SBR10 (PI 605854B and PI 605891A). Regarding phytosterols, the PSD-resistant genotype SBR50 (PI 424324B) contained more than twice as much stigmasterol (6.71 mg/g of dried soybean sample) as SBR46 (PI 51867), which contained 3.06 mg/g. SBR46 was susceptible to SBR, PSD, and PSS in our previous tests [18][19][20][21]. Another PSD-resistant genotype, SBR50 (PI 549020), contained 3.86 mg/g of stigmasterol. One genotype whose disease reaction is unknown, SBR41 (PI 471208), had the highest stigmasterol content. Further study is needed to test whether soybean line PI 471208 is resistant to PSD.

Chemometrics analysis using GC-MS data
Compounds present in the various soybean samples are identified in S1 Table. Using this information, we assigned each compound to its chemical class and aggregated the data by class to produce S2 Table (Supplementary Material) for principal component analysis (PCA). PCA yielded three principal components (PCs) (Table 3 and Fig 3), which together accounted for 89.87% of the total variance among the genotypes. For each component, chemical classes with larger positive loadings contributed more strongly. PC1 explained 42.23% of the total variance and was dominated by carboxylic acids, sugar moieties, and esters, all of which had high positive loadings on PC1. Genotypes SBR53 (PI 549020), SBR12 (PI 417503), SBR47 (PI 556625), SBR24 (PI 506764), SBR43 (PI 567351B), and SBR51 (PI 458130) varied most along this component. PC2 explained 39.54% of the total variance, with higher loadings for sugar moieties, aldehydes, and ketones. The genotypes that varied most along this component were SBR47 (PI 556625), SBR24 (PI 506764), SBR1 (PI 587880A), SBR8 (PI 567039), SBR10 (PI 605891A), and SBR11 (PI 605865B). PC3 accounted for 8.11% of the total variance, with higher loadings for esters, tocopherols, and heterocyclic compounds. SBR42 (PI 606440A), SBR43 (PI 567351B), SBR16 (PI 606405), and SBR41 (PI 471208) displayed the greatest variability along this component. Based on these findings, the most divergent genotypes were SBR24 (PI 506764), SBR47 (PI 556625), and SBR43 (PI 567351B), as each stood out on at least two of the three components. PCA thus reduced the number of variables required for cultivar classification, which may help soybean researchers establish more meaningful relationships between key soybean characteristics.
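The "at least two of three components" criterion can be checked directly against the genotype lists given above for PC1, PC2, and PC3. This small sketch only counts how many of those lists each genotype appears in; it reproduces the rule as stated in the text, not any thresholding performed inside SIMCA.

```python
from collections import Counter

# Genotypes reported in the text as varying most along each principal component.
TOP_BY_PC = {
    "PC1": ["SBR53", "SBR12", "SBR47", "SBR24", "SBR43", "SBR51"],
    "PC2": ["SBR47", "SBR24", "SBR1", "SBR8", "SBR10", "SBR11"],
    "PC3": ["SBR42", "SBR43", "SBR16", "SBR41"],
}


def divergent(top_by_pc, min_components=2):
    """Genotypes listed for at least `min_components` different PCs."""
    counts = Counter(g for genotypes in top_by_pc.values() for g in set(genotypes))
    return sorted(g for g, c in counts.items() if c >= min_components)


print(divergent(TOP_BY_PC))  # ['SBR24', 'SBR43', 'SBR47']
```

The result matches the three genotypes singled out above: SBR24 and SBR47 appear for PC1 and PC2, and SBR43 for PC1 and PC3.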

UHPLC-DAD/MS analysis of isoflavone components
Five major isoflavone components, viz. daidzin (1), glycitin (2), genistin (3), malonyldaidzin (4), and malonylglycitin (5), were identified and quantified in this study. The structures of these compounds are shown in Fig 1. Chromatograms of the eight isoflavone reference standards and seed samples from different soybean genotypes are illustrated in Fig 4. Quantification results for each isoflavone component can be found in Table 4. While isoflavone content varied among soybean genotypes, all three soybean genotypes containing the soybean rust Rpp6 gene (PI 567102B, PI 567104B, PI 567129) ranked among the five genotypes with the highest concentrations of daidzin, glycitin, genistin, and malonyldaidzin. In previous analyses of soybean leaves, increased accumulation of the isoflavonoids genistein and daidzein occurred after inoculation with the soybean rust pathogen P. pachyrhizi [28]. These two isoflavones could be key phytochemicals contributing to the resistant responses of soybean lines carrying the Rpp6 resistance gene.
In our study, PI 230970 (Rpp2) exhibited the highest level of genistin. However, no genistin was detected in SBR45 (PI 417125), another reported soybean rust-resistant genotype possessing the Rpp2 resistance gene. Pathogenicity experiments are needed to determine whether SBR45 shows the Rpp2 resistant reaction after inoculation with P. pachyrhizi. In addition, SBR29 (PI 200456), which carries the soybean rust Rpp5 resistance gene, contained 0.09 mg/g of glycitein, a compound that was absent in all other 51 tested genotypes, including five other genotypes with the Rpp5 gene (Table 1). The role of glycitein in the resistant response of PI 200456, however, is not known.
Isoflavones are a group of phenolic compounds commonly found in the legume (Fabaceae) family, and they are important secondary metabolites accumulated in soybean. Isoflavones have been reported to contribute to overall human health, including protection against chronic diseases [25,26]. They have also been reported to mediate important interactions with plant-associated microbes, including defense against pathogens and nodulation [27]. Soybean rust (SBR) is one of the most important soybean foliar diseases in many major soybean-producing countries. In a study on the importance of phenolic metabolism in limiting the growth of P. pachyrhizi, inoculation of soybean plants with the pathogen increased the accumulation of isoflavonoids and flavonoids in the leaves of all soybean genotypes tested [28]. Although the soybean phytoalexin glyceollin was not detected in leaves of uninfected plants, it accumulated to marked levels in rust-infected leaves [28]. In another study of the susceptible genotype PI 636463, P. pachyrhizi infection induced significant production of defense secondary metabolites, including phenylpropanoids, terpenoids, and flavonoids [29]. Exploring the relationship between isoflavone content and soybean genotypes with known resistance genes would help reveal the biochemical basis of soybean resistance to disease. Experiments are underway to analyze the isoflavone content of resistant and susceptible soybean genotypes with and without inoculation with pathogens (P. pachyrhizi and D. longicolla). In this study, no conclusion could be drawn about direct correlations between the studied compounds and phenotypic resistance or susceptibility to the pathogens. To address this issue, phytopathogenic experiments will be conducted on selected representative genotypes, such as PI 518671 (Williams 82), a genotype well known to be susceptible to soybean rust [18], Phomopsis seed decay [19,20], and purple seed stain [21], and the resistant line PI 567102B, which contains the soybean rust Rpp6 gene [18]. Soybean samples will be collected directly from replicated tests for the phytochemical analyses.
Together, information obtained from comprehensive phytochemical profiling of soybean genotypes and their reactions to pathogens will facilitate the selection of soybean lines for breeding for disease resistance.

Conclusion
In this study, comprehensive phytochemical profiling of soybean genotypes with different reactions to soybean diseases, including soybean rust, Phomopsis seed decay, and purple seed stain, was conducted using GC-MS and UHPLC-DAD/MS. The results revealed considerable diversity in isoflavone profiles among the genotypes studied. Notably, certain genotypes carrying specific soybean rust resistance genes exhibited higher levels of particular isoflavone components than the others. For example, genotypes carrying the soybean rust Rpp6 gene consistently contained elevated levels of daidzin, glycitin, genistin, and malonyldaidzin. Furthermore, the genotype carrying the Rpp2 gene (SBR5) showed a distinct accumulation of genistin and malonyldaidzin, while one genotype carrying the Rpp5 gene (SBR29) was the only genotype to contain glycitein. These results suggest a potential association between soybean rust resistance genes and the biosynthesis of key phytochemicals in soybean, pointing to the importance of genetic factors in shaping the phytochemical composition of soybean varieties. Such insights could contribute to the development of disease-resistant soybean cultivars with enhanced nutritional and functional attributes. Overall, this study sheds light on the interplay between genetics, phytochemistry, and disease resistance in soybeans, with implications for soybean breeding and agricultural practices.

Fig 2. A representative total ion chromatogram of a soybean seed genotype (PI 230970, SBR5).
https://doi.org/10.1371/journal.pone.0308489.g002

Table 2. Phytocompounds identified in the methanolic seed extract of 52 soybean genotypes by GC-MS.
https://doi.org/10.1371/journal.pone.0308489.t002